This report is an exploratory data analysis of weather data from SeaTac International Airport in Seattle, Washington. The data spans 67 years (1948 to 2015) and records temperature, precipitation, and weather events such as storms and hail. The goal of this analysis is to explore the relationships between weather variables in order to find interesting and surprising correlations. Taken in the global context of climate change, this analysis also seeks evidence of any local effects.
## [1] 63 15
## [1] "Year" "AvgTemp" "MaxTemp"
## [4] "MinTemp" "YearlyPrecipitation" "AvgWind"
## [7] "DaysRain" "DaysSnow" "DaysStorm"
## [10] "DaysFog" "DaysTornado" "DaysHail"
## [13] "DaysPrecipitation" "PrecipitationPerDay" "AvgTemp.bucket"
## 'data.frame': 63 obs. of 15 variables:
## $ Year : int 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 ...
## $ AvgTemp : num 9.4 9.9 9.4 9.9 9.9 10.3 9.6 8.7 9.7 10.2 ...
## $ MaxTemp : num 14.7 15.9 14.7 15.7 15.7 15.4 14.6 13.7 15 15.4 ...
## $ MinTemp : num 5.3 5.4 5.1 5.2 5.4 6.3 5.6 4.9 5.7 6.4 ...
## $ YearlyPrecipitation: num 1170 845 1408 1033 611 ...
## $ AvgWind : num 14.3 14 16.7 16.7 15.8 15.2 17.5 19 19.7 19 ...
## $ DaysRain : int 212 175 218 163 157 227 214 210 177 189 ...
## $ DaysSnow : int 22 42 36 35 26 10 24 39 33 21 ...
## $ DaysStorm : int 20 20 10 8 4 10 6 5 4 4 ...
## $ DaysFog : int 159 137 153 144 151 150 142 162 146 153 ...
## $ DaysTornado : int 0 0 0 0 0 0 0 0 0 0 ...
## $ DaysHail : int 4 2 1 0 2 1 0 5 0 2 ...
## $ DaysPrecipitation : int 234 217 254 198 183 237 238 249 210 210 ...
## $ PrecipitationPerDay: num 5 3.89 5.54 5.21 3.34 ...
## $ AvgTemp.bucket : Factor w/ 4 levels "(8.69,10.4]",..: 1 1 1 1 1 1 1 1 1 1 ...
## Year AvgTemp MaxTemp MinTemp
## Min. :1948 Min. : 8.70 Min. :13.70 Min. :4.900
## 1st Qu.:1964 1st Qu.:10.40 1st Qu.:15.65 1st Qu.:6.300
## Median :1982 Median :11.00 Median :16.10 Median :6.700
## Mean :1981 Mean :10.87 Mean :16.15 Mean :6.643
## 3rd Qu.:1998 3rd Qu.:11.30 3rd Qu.:16.65 3rd Qu.:7.100
## Max. :2015 Max. :12.90 Max. :18.40 Max. :8.300
##
## YearlyPrecipitation AvgWind DaysRain DaysSnow
## Min. : 611.4 Min. :10.20 Min. :144.0 Min. : 3.0
## 1st Qu.: 884.8 1st Qu.:12.15 1st Qu.:177.0 1st Qu.: 8.0
## Median :1004.4 Median :13.70 Median :190.0 Median :15.0
## Mean :1008.8 Mean :13.78 Mean :188.8 Mean :16.4
## 3rd Qu.:1125.1 3rd Qu.:14.65 3rd Qu.:204.5 3rd Qu.:23.0
## Max. :1408.2 Max. :19.70 Max. :227.0 Max. :42.0
## NA's :9
## DaysStorm DaysFog DaysTornado DaysHail
## Min. : 0.000 Min. : 12.0 Min. :0.00000 Min. :0.0000
## 1st Qu.: 4.000 1st Qu.: 55.0 1st Qu.:0.00000 1st Qu.:0.0000
## Median : 6.000 Median :149.0 Median :0.00000 Median :1.0000
## Mean : 6.619 Mean :122.2 Mean :0.01587 Mean :0.8571
## 3rd Qu.: 8.000 3rd Qu.:163.0 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :20.000 Max. :186.0 Max. :1.00000 Max. :5.0000
##
## DaysPrecipitation PrecipitationPerDay AvgTemp.bucket
## Min. :153.0 Min. :3.341 (8.69,10.4]:17
## 1st Qu.:189.5 1st Qu.:4.468 (10.4,11] :17
## Median :205.0 Median :4.914 (11,11.3] :14
## Mean :205.2 Mean :4.972 (11.3,12.9]:15
## 3rd Qu.:218.5 3rd Qu.:5.412
## Max. :254.0 Max. :7.589
## NA's :9
Average temperature is approximately normally distributed.
Days of rain is skewed left. Mode at 210.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 8.0 15.0 16.4 23.0 42.0
Days of snow is skewed right (the mean of 16.4 sits above the median of 15, with a long upper tail). Modes at 8 and 14. Median at 15. Outliers at 39 and 42.
Roughly normal, but has 3 peaks. Days of Precipitation is calculated by adding days of rain and days of snow.
Again, roughly normal.
Skewed right. Some outliers.
Clear bimodal distribution. What is going on here?
The dataset contains 63 observations of 15 variables (3 of them created for this analysis). Each observation corresponds to a year of data from SeaTac International Airport in Seattle. The years 2002, 2005, and 2016 were N/A in the original dataset and have been removed.
AvgTemp, or Average Temperature, is the main feature of interest in the dataset.
I will be looking at average rainfall, snowfall, and wind velocity, days of storm, hail and fog as well as precipitation per year to explore the possible effects of rising temperatures.
Yes. I created 3 new variables. DaysPrecipitation is the number of days of snow plus the number of days of rain. PrecipitationPerDay is the annual precipitation divided by the days of precipitation. AvgTemp.bucket assigns each observation to a quartile of average temperature.
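The three derived variables can be sketched in a few lines. This is a minimal Python illustration, not the R code behind this report: the rows are the first three years as displayed in the `str()` output above (so precipitation is rounded), the bucket edges are read off the printed `AvgTemp.bucket` labels, and the helper name `add_derived` is hypothetical.

```python
from bisect import bisect_left

# First three years as displayed in the str() output (values are rounded there).
rows = [
    {"Year": 1948, "AvgTemp": 9.4, "YearlyPrecipitation": 1170.0, "DaysRain": 212, "DaysSnow": 22},
    {"Year": 1949, "AvgTemp": 9.9, "YearlyPrecipitation": 845.0, "DaysRain": 175, "DaysSnow": 42},
    {"Year": 1950, "AvgTemp": 9.4, "YearlyPrecipitation": 1408.0, "DaysRain": 218, "DaysSnow": 36},
]

# Interior quartile edges and labels, copied from the AvgTemp.bucket factor levels.
BUCKET_EDGES = [10.4, 11.0, 11.3]
BUCKET_LABELS = ["(8.69,10.4]", "(10.4,11]", "(11,11.3]", "(11.3,12.9]"]


def add_derived(row):
    """Return a copy of `row` with the three derived variables added."""
    out = dict(row)
    out["DaysPrecipitation"] = row["DaysRain"] + row["DaysSnow"]
    out["PrecipitationPerDay"] = row["YearlyPrecipitation"] / out["DaysPrecipitation"]
    # bisect_left keeps the intervals right-closed: 10.4 lands in "(8.69,10.4]".
    out["AvgTemp.bucket"] = BUCKET_LABELS[bisect_left(BUCKET_EDGES, row["AvgTemp"])]
    return out


derived = [add_derived(r) for r in rows]
for d in derived:
    print(d["Year"], d["DaysPrecipitation"], round(d["PrecipitationPerDay"], 2), d["AvgTemp.bucket"])
```

For 1948 this gives 234 days of precipitation and 5.0 per day, matching the first entries of `DaysPrecipitation` and `PrecipitationPerDay` in the output above.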
Most of the distributions I examined were approximately normal. Fog has an unusual bimodal distribution which will be addressed in the next section.
## [1] "Year" "AvgTemp" "MaxTemp"
## [4] "MinTemp" "YearlyPrecipitation" "AvgWind"
## [7] "DaysRain" "DaysSnow" "DaysStorm"
## [10] "DaysFog" "DaysTornado" "DaysHail"
## [13] "DaysPrecipitation" "PrecipitationPerDay" "AvgTemp.bucket"
Key correlations:
Average temperature and average wind = -.514
Average temperature and days of snow = -.708
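For readers who want to see how a coefficient like these is computed, here is a minimal Python sketch of the Pearson correlation (an illustration only, not the R code behind this report). It uses just the first ten years shown in the `str()` output, so its result will not match the full-dataset value of -.708; the function name `pearson` is hypothetical.

```python
from math import sqrt

# First ten years (1948-1957) as displayed in the str() output.
avg_temp = [9.4, 9.9, 9.4, 9.9, 9.9, 10.3, 9.6, 8.7, 9.7, 10.2]
days_snow = [22, 42, 36, 35, 26, 10, 24, 39, 33, 21]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and the two sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


r = pearson(avg_temp, days_snow)
print(f"AvgTemp vs DaysSnow, first ten years only: r = {r:.3f}")
```

Even on this small slice the coefficient is negative, in line with the direction reported above.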
Average yearly temperature has been increasing gradually for the past 67 years.
The number of days of rain per year has been relatively stable except for the past few years.
There has been a dramatic decrease in the number of days of snow per year.
Given the decline of both rain and snow days, it is no surprise that the combination of the two also declines.
Aside from the missing years, it appears that the amount of yearly precipitation has remained the same, although there is a large amount of variability from year to year.
As the yearly precipitation stays the same and the number of snow and rain days decreases, the amount of precipitation per precipitation event must increase.
There is a clear negative correlation (-.708) between average temperature and days of snow.
Another negative correlation (-.298), this time between average temperature and days of rain.
This is a surprising find. How would average wind speed be affected by temperature?
Both of these are highly negatively correlated to temperature, and as a result are correlated to each other.
Dashed line is y=x. Points below this line represent years with more storms than hail. This is intuitive because most hail is expected to happen within storm conditions. There is one year where hail happened twice and there was only one storm. This is an interesting phenomenon but does not call attention to a larger trend.
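The comparison against the y=x line amounts to checking, year by year, whether storm days exceed hail days. A minimal Python sketch (illustration only, not the R code behind this report), using the first ten years from the `str()` output; the helper name `side_of_line` is hypothetical.

```python
from collections import Counter

# First ten years (1948-1957) as displayed in the str() output.
years = list(range(1948, 1958))
days_storm = [20, 20, 10, 8, 4, 10, 6, 5, 4, 4]
days_hail = [4, 2, 1, 0, 2, 1, 0, 5, 0, 2]


def side_of_line(storm, hail):
    """Classify one year relative to the line storm == hail."""
    if storm > hail:
        return "more storm days"
    if storm < hail:
        return "more hail days"
    return "equal"


labels = [side_of_line(s, h) for s, h in zip(days_storm, days_hail)]
counts = Counter(labels)
for year, label in zip(years, labels):
    print(year, label)
print(dict(counts))
```

In this ten-year slice, nine years have more storm days than hail days and one year (1955) sits exactly on the line; the single year with more hail than storms mentioned above falls outside this slice.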
Here’s the source of the bimodality seen in the univariate plot section.
As the year increases, average temperature increases, with 2014 and 2015 having significant increases when compared to the previous 50 years. Some years, such as 1958 and 1992, spike but drop the next year. As the year increases, the number of days of rain and the number of days of snow decrease. This is contrasted by a consistent annual precipitation, suggesting an increase in the duration/severity of daily precipitation events. This increase is not due to more extreme weather events, as the number of days of stormy weather has remained approximately constant over time.
While examining these trends with regard to average temperature, several strong correlations were noted. The strongest relationship was between average temperature and days of snow, with a correlation of -.708, suggesting a strong decrease in snow days as temperature rises. In contrast, there is only a -.298 relationship between days of rain and temperature. Given the strong correlation between days of rain and snow and the precipitation per day feature, it is not surprising that the correlation is -.403. Another interesting correlation is between average temperature and average wind speed, with a correlation of -.514. This suggests that rising temperatures correspond with a decrease in wind speed.
Yes. Average wind speed and days of snow have a correlation of .483. This makes sense in light of earlier findings that average temperature has a strong relationship to both features, so that years with high average temperatures correspond to fewer snow days and a lower average wind speed. This relationship will be addressed further in the multivariate plots section.
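One way to check how much of the wind-snow correlation is explained by temperature is a first-order partial correlation. This technique is not used in the original analysis; the Python sketch below is an added illustration that plugs in the three coefficients reported in this section, with -.514 taken from the key correlations list for temperature and wind.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z (first-order partial correlation)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Coefficients as reported in this report.
r_wind_snow = 0.483
r_wind_temp = -0.514
r_snow_temp = -0.708

r_wind_snow_given_temp = partial_corr(r_wind_snow, r_wind_temp, r_snow_temp)
print(f"wind vs snow, controlling for temperature: {r_wind_snow_given_temp:.2f}")
```

The result is roughly 0.2, well below the raw .483, which is consistent with the idea that most of the wind-snow relationship runs through temperature.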
Looking at days of hail vs. days of storm reveals that in all but one year there were more storms than hail events. This is an expected result, as most hail occurs during storm events. Plotting days of fog by year shows a dramatic 5-fold drop-off between 1996 and 1997. This will be explored further in the final plots section.
The strongest relationship in the data is between the average temperature and days of snow, with a correlation of -.708.
Maximum and minimum temperatures follow average temperature very closely. The dashed line, maximum minus minimum, shows how far the two diverge in each year; it stays fairly consistent, even in light of the general increase in temperature.
## sw$AvgTemp.bucket: (8.69,10.4]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 19.00 24.00 25.47 33.00 42.00
## --------------------------------------------------------
## sw$AvgTemp.bucket: (10.4,11]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 14.00 16.00 16.35 23.00 25.00
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11,11.3]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 8.00 12.00 13.36 19.00 26.00
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11.3,12.9]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 5.5 7.0 9.0 10.5 22.0
Days of snow broken up by temperature quartiles. The warmest temperatures occur the most recently, and have the lowest number of days of snow. Each successively warmer bucket has fewer snow days in nearly every summary statistic.
## sw$AvgTemp.bucket: (8.69,10.4]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.80 14.60 15.60 15.91 16.80 19.70
## --------------------------------------------------------
## sw$AvgTemp.bucket: (10.4,11]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.8 11.9 13.2 12.9 13.8 15.2
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11,11.3]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.20 12.37 13.25 13.11 13.88 14.70
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11.3,12.9]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.90 11.45 13.10 13.00 13.80 17.90
Average wind speed broken up by temperature quartiles. Compared to days of snow, the temperature buckets are not as emphatic. The coldest bucket has a clearly higher median (15.6) than the other three, whose medians are nearly identical (13.1 to 13.25); the other measures show no consistent decrease.
Putting the previous graphs together and taking out year provides a different look at the same data.
Same as the previous, divided into separate plots rather than color. Each increase in temperature bucket leads to increasing clustering in the lower left corner.
The more extreme values of precipitation per day are found in the higher temperature buckets. Whether this is a coincidence is unclear.
The first plot in this section compares the average min, average max, and average yearly temperatures. As expected, the minimum and maximum temperature averages closely follow the average temperature, with yearly spikes and drops showing across all three variables. The dashed line, MaxTemp - MinTemp, shows years in which this is not the case. For example, the dashed line drops in 1997, a sign that the average minimum temperature that year increased but the average maximum did not. Aside from these minor deviations, the dashed line shows that despite gains in average temperature over time, the variance in temperature remains about the same.
The next several plots detail the relationship between average wind speed, days of snow, year, and temperature. This is done in 4 plots. The first two plots show days of snow and average wind speed respectively by year, split over average temperature buckets divided into quartiles. These plots show that both average wind speed and days of snow decrease as year and temperature increase. They also show that the warmer the temperature, the more recent the year is likely to be. Plot 3 removes year to emphasize the relationships between average wind, snow days, and temperature. Plot 4 separates the colors of plot 3 into separate plots for clarity.
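The dashed-line quantity, MaxTemp - MinTemp, is a simple per-year difference. A minimal Python sketch (illustration only, not the R code behind this report) using the first ten years from the `str()` output:

```python
# First ten years (1948-1957) as displayed in the str() output, in degrees C.
years = list(range(1948, 1958))
max_temp = [14.7, 15.9, 14.7, 15.7, 15.7, 15.4, 14.6, 13.7, 15.0, 15.4]
min_temp = [5.3, 5.4, 5.1, 5.2, 5.4, 6.3, 5.6, 4.9, 5.7, 6.4]

# Rounded to one decimal to avoid floating-point noise such as 9.399999999999999.
temp_range = [round(hi - lo, 1) for hi, lo in zip(max_temp, min_temp)]

for year, value in zip(years, temp_range):
    print(year, value)
```

For these ten years the difference stays between 8.8 and 10.5 degrees.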
I would say the inverse relationship between average temperature and average wind speed is surprising. It is not surprising that the number of days of snow decreases as temperature rises, as snow only happens in cold weather. In contrast, it is not intuitive that wind speed should decrease with an increase in temperature. Wind is formed by differences in air pressure that result from differences in temperature. This implies that the decrease in average wind speed is the result of a smaller temperature differential. As such, it is likely that this decrease in wind speed is related not only to local temperatures, but to variable rates of temperature increases in the region following a trend of global warming. I think it is interesting that wind speed might be a local effect of global warming.
Minimum and maximum temperatures (averaged over the year) closely follow the average temperature trend. The dashed line, Maximum - Minimum, shows some deviation year to year but overall displays that the range of temperatures stays approximately the same, even as the average temperature has increased 2-3 C over 65 years.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.900 9.050 9.400 9.505 9.900 11.100
The median of Maximum - Minimum is 9.4 degrees, with a minimum difference of 7.9 and a maximum of 11.1. The 1st and 3rd quartiles are 9.05 and 9.9 respectively, indicating that 50% of the values fall within a range of less than 1 degree. The histogram illustrates this, with 11.1 and 7.9 being clear outliers. The consistency of this variable is important as it signifies the consistency of variation in temperature year to year. For now at least, the increase in average temperature seems not to affect this variation. This is important because increased swings in temperature are likely to have a strong impact on wild ecosystems and crop growth.
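The quartile arithmetic can be checked directly from the summary above. The Python sketch below (illustration only) also applies Tukey's 1.5 × IQR fences; that rule is an assumption added here, not the criterion used in this report, which reads the outliers off the histogram.

```python
# Summary statistics of MaxTemp - MinTemp, copied from the output above.
q1, q3 = 9.05, 9.9
lowest, highest = 7.9, 11.1

iqr = q3 - q1  # width of the middle 50% of values

# Tukey's fences (an assumed convention, not taken from the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_tukey_outlier(value):
    """True if `value` lies outside the 1.5 * IQR fences."""
    return value < lower_fence or value > upper_fence


print(f"IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.3f}, {upper_fence:.3f})")
print("7.9 outside fences:", is_tukey_outlier(lowest))
print("11.1 outside fences:", is_tukey_outlier(highest))
```

The IQR comes out at 0.85 degrees, confirming the "less than 1 degree" statement. Under the 1.5 × IQR convention both extremes fall just inside the fences, so their status as outliers rests on the visual gap in the histogram rather than on this rule.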
Average temperature has a strong inverse correlation to average wind speed and days of snow per year. These correlations are -.514 and -.708 respectively. This plot shows that the greater the temperature (bucket) the more clustered the points are in the lower left corner of the plot, corresponding to lower average wind speed and fewer days of snow.
Days of snow:
## sw$AvgTemp.bucket: (8.69,10.4]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 19.00 24.00 25.47 33.00 42.00
## --------------------------------------------------------
## sw$AvgTemp.bucket: (10.4,11]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 14.00 16.00 16.35 23.00 25.00
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11,11.3]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 8.00 12.00 13.36 19.00 26.00
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11.3,12.9]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 5.5 7.0 9.0 10.5 22.0
Average wind speed:
## sw$AvgTemp.bucket: (8.69,10.4]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.80 14.60 15.60 15.91 16.80 19.70
## --------------------------------------------------------
## sw$AvgTemp.bucket: (10.4,11]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.8 11.9 13.2 12.9 13.8 15.2
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11,11.3]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.20 12.37 13.25 13.11 13.88 14.70
## --------------------------------------------------------
## sw$AvgTemp.bucket: (11.3,12.9]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.90 11.45 13.10 13.00 13.80 17.90
Looking at the statistics above, days of snow falls in nearly every measure as the average temperature bucket increases. Average wind speed drops from the coldest bucket to the next and then stays roughly level. Days of snow is the more striking figure, as the median drops from 24 in the coldest bucket to 7 in the warmest. That the median days of snow drops by more than a factor of 3 with around a 3 degree change is surprising in intensity.
A massive drop-off in the days of fog, by a factor of five between 1995 and 1997, raises some questions. A search for changes in fog definition and fog reporting standards returned nothing substantial. An examination of the website which provides this data found similar trends nationally:
Kansas City - http://en.tutiempo.net/climate/ws-724463.html
New York - http://en.tutiempo.net/climate/ws-744860.html
Similar trends are seen internationally, but the years of drop-off vary by country and some countries have no drop-off at all:
London - http://en.tutiempo.net/climate/ws-37720.html
Tokyo - http://en.tutiempo.net/climate/ws-476710.html
Rome - http://en.tutiempo.net/climate/ws-162390.html
The dataset analyzed contains weather information from SeaTac airport in Seattle, ranging from 1948 to 2015. The first difficulty I ran into was missing values. I decided to remove 3 years from the dataset which were missing values in all variables, but decided to keep 9 years which were missing values in yearly precipitation only. My first pass at analysis, looking at univariate distributions, turned up only 1 bimodal variable, days of fog. This became an extra path of inquiry along with my original intention of looking at the relationships of many of the variables with temperature.
Moving on to bivariate analysis, I found making a matrix of plots with ggpairs was essential for locating correlations within the data. It guided me to the correlation between temperature and days of snow/wind speed. It also showed me no correlations between days of fog and any of the other variables, which prompted me to plot fog by year. This plot (final plot 3) instantly clarified the bimodal distribution found in the univariate analysis. It is an example of how an initially promising piece of data can be misleading, and finding an explanation for it was one of the big difficulties in my analysis. In contrast, the plots for temperature and days of snow/wind speed were compelling out of the box and represent the bulk of success in my analysis.
In the multivariate analysis I had trouble finding 3 variables that had interesting relationships. To that end I created several variables, including precipitation per day and temperature buckets, to further my analysis. Moving forward, the results raise some questions. Is wind speed similarly affected by temperature in other locations? What other factors might affect wind speed besides temperature that aren't included in this analysis? Similar questions can be asked for days of snow as well.